Lecture 3
High-level technologies applied to text – part 2
Dan Cristea

The pragmatics layer
[Diagram: INITIAL text => SUB-SYNTACTIC PROCESSING => SYNTACTIC PROCESSING => SEMANTIC PROCESSING => DISCOURSE PROCESSING => PRAGMATIC PROCESSING => result. Further labels: USE OF LANGUAGE IN REAL CONTEXTS; AFFORDANCES]

Poeticon++: robots learn to use language

From strings of letters to syntax
• string of letters: Toss with spoon the chicken salad!
• words: Toss | with | spoon | the | chicken | salad | !
• parts of speech: V PREP N DET N N PUNCT
• phrases: V PP NP PUNCT
• syntactic tree: V PP NP PUNCT, with edge labels c, instr, d, obj

From syntax to semantics
• syntax: toss with spoon the chicken salad (edge labels c, instr, d, obj)
• semantic generalisation: salad

From semantics to pragmatics
• semantics
  - A salad can be tossed. A spoon is a tool. (valid affordances)
  - A carrot can be cut. A spoon cannot cut things.
• pragmatics
  - "Cut with spoon the carrot!" (refused affordances)
• iCub decides by itself how to solve a command

Linguistic Linked Open Data (LLOD)
a subfield of Natural Language Processing
- Develop techniques able to decipher the semantic content of texts
  - narrative lines (e.g. what happens and when)
  - semantic relations between entities (e.g. genealogical trees, spatial and temporal relations)
  - statistics about entities (# mentions => salience, etc.)
  - summaries (general, focused on characters)

Linguistic Linked Open Data (LLOD)
- Generation of ontologies from collections of scientific works
  - applications that “read” science books and formalize concepts and their instances
- Intelligent documentary search
- Personalized assistants for a research activity

Entity linking
• Challenges in entity linking:
  - name variations
  - ambiguities
  - first mentions
  - reference chains

Linking entities internally

Linking entities externally

I like traveling and reading…

Going out of the book…
[Figure: Google Maps walking directions from Katip Çelebi Mh., Maç Sk. to Çukur Cuma Cd., Beyoğlu, Turkey; 400 m, about 4 mins]

I need help to remember all kinship relations between characters

Characters in Forsyte Saga
• The old Forsytes
  - Ann, the eldest of the family
  - Old Jolyon, the patriarch of the family, having made a fortune in tea
  - James, a solicitor, married to Emily, a most tranquil woman
  - Swithin, James's twin brother with aristocratic pretensions; a bachelor
  - Roger, "the original Forsyte"
  - Julia (Juley), a fluttery dowager; Mrs Septimus Small
  - Hester, an old maid
  - Nicholas, the wealthiest in the family
  - Timothy, the most cautious man in England
  - Susan, the married sister
• The young Forsytes
  - Young Jolyon, Old Jolyon's artistic and free-thinking son, married three times
  - Soames, James and Emily's son, an intense, unimaginative and possessive solicitor, married to the unhappy Irene, who later marries Young Jolyon
  - Winifred, Soames's sister, one of the three daughters of James and Emily, married to the foppish and lethargic Montague Dartie
  - George, Roger's son, a dyed-in-the-wool mocker
  - Francie, George's sister and Roger's daughter, emancipated from God
• Their children
  - June, Young Jolyon's defiant daughter from his first marriage; engaged to an architect, Philip Bosinney, who becomes Irene's lover
  - Jolly, Young Jolyon's son from his second marriage; dies of enteric fever during the Boer Wars
  - Holly, Young Jolyon's daughter from his second marriage, to June's governess
  - Jon, Young Jolyon's son from his third marriage, to Irene, Soames's first wife
  - Fleur, Soames's daughter from his second marriage, to a French Soho shopgirl Annette; Jon's lover; later marries a baronet, Michael Mont
  - Val, Winifred and Montague's son; fights in the Boer Wars; marries his cousin Holly
  - Imogen, Winifred and Montague's daughter
• Others
  - Parfitt, Old Jolyon's butler
  - Smither, Aunts Ann, Juley and Hester's housekeeper
  - Warmson, James and Emily's butler
  - Bilson, Soames's housemaid
  - Prosper Profond, Winifred's admirer and Annette's lover

Bring the book back into the hands of children!
• What do youngsters hold in their hands nowadays?
  - Tablets
  - Kendamas
  - Books?

MappingBooks – Get out of the book!
Escape from the book into the virtual and real world!
a project financed by the Romanian Ministry of Education and Research (July 2014 – September 2017)

Creating a more intimate link between the book and its reader
• Recognise mentions of locations in the text
• Crawl the web for supplementary information
• Know where the reader is
• Point out entities mentioned in the text that are in the reader's proximity
• Trace them on maps
• Mix images with generated info

MappingBooks – a bird's view

MappingBooks
• A MappedBook is a book connected with locations/events in the virtual and real world and sensitive to the instantaneous location of the reader (as sensed by the phone/tablet)
• The information made available may differ depending on the moment and the place of the reader

MappingBooks
• Multi-dimensional mash-ups combining textual and geographical data
• Spot book mentions of entities (persons and locations) and link them in the virtual world
• Make heavy use of entity linking techniques
• Easy-to-handle interface for young readers

The application
1) Connects mentions of entities (nominal groups) => one entity = a chain of coreferential mentions
2) The knowledge base does not include any a priori records about entities => starts from scratch
3) Identifies geographical relations (distances, positions, proximities, intersections, etc.)
4) Type of texts: geography textbooks

Entity types in MB
• Type PERSON
• Type LOCATION
• Type ORGANISATION
• Type URL
• Type TIMEX

Textual realisation of entities
• Syntactic realisation: NPs (proper nouns, common nouns, adjectives, complement PPs; but NO relative clauses)
• Characterised by distinctive heads
  - [the house on the [mountain]]
• If intersected => nested (imbricated)
  - [the museum [Grigore Antipa]]

Processing features: TEXT ANALYTICS
• The capacity to see a text as more than a string of letters
  - sentence splitting
  - tokenisation
  - POS-tagging
  - lemmatisation
  - NP chunking
  - anaphora resolution

Processing features: NAMED ENTITY RECOGNITION
• Know who's who
  - recognise names and types
  - disambiguate names
  - recognise an entity in the text even if mentioned by a common noun or a pronoun
  - use an ontology of types

Processing features: ENTITY CRAWLING
• What virtual world entities are mentioned in the book?
  - link textual mentions to entities in the virtual world
  - decide what virtual info would be relevant to the user
  - employ multiple sources

Processing features: GEOGRAPHY
• Fetch, process and make use of geo-data
  - Geographic Information Systems (GIS)
  - geographic layers

Processing features: RELATIONS DETECTION, MAPS & TRAJECTORIES
• Trace on maps spatial relations as described in the book
  - detect spatial relations in text
  - use Google Maps-like geo-strata (actually we procured our own maps)
  - trace locations and paths on maps

Processing features: DEVICE INFO
• Know where I am
• What real world entities are in my proximity?
  - locate the position of the user (GPS)
  - compute distances between real places and those mentioned in the book
  - signal “interesting” locations in proximity

Processing features: AUGMENTED REALITY
• Mix images with generated info
  - sense the orientation of the camera (compass)
  - process images => segmentation, contours, recognition
  - decide which info to display

Processing features: INTERFACES
• Attractive user interface
  - analyse use cases
  - design different dedicated user interfaces
  - accommodate on the screen a segment of text, a map, the user's position, web info, etc.

Processing features: CLIENT-SERVER
• Client-server
  - the user's portrait
  - the databases
  - standards and communication protocols

Other issues…
• RESOURCES
  - find texts
  - clear IPR
  - perform annotation
  - find other relevant linguistic and geographic data

[Diagram legend]
TA = Text Analytics
NER = Named Entity Recognition
AR = Augmented Reality
EC = Entity Crawling
DEV = Device Info
RD = Relations Detection
INT = Interfaces
GEO = Geography
RES = Resources
M&T = Maps and Trajectories
M&E = Management and Evaluation

What else could be added?
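As an illustration of the DEVICE INFO feature listed above (compute distances between the reader's position and places mentioned in the book, then signal those in proximity), here is a minimal sketch. It is not the MappingBooks implementation: the function names, the 100 km radius and the rounded coordinates are assumptions chosen for the example, and the haversine formula is used as one common way to approximate great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_mentions(reader_pos, book_locations, radius_km):
    """Return (name, distance_km) pairs within radius_km, nearest first."""
    lat, lon = reader_pos
    hits = []
    for name, (loc_lat, loc_lon) in book_locations.items():
        distance = haversine_km(lat, lon, loc_lat, loc_lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical data: locations mentioned in a book, with rounded,
# approximate coordinates used for illustration only.
book_locations = {
    "Brașov": (45.65, 25.60),
    "Sibiu": (45.80, 24.15),
    "Paris": (48.86, 2.35),
}

# Hypothetical reader position, somewhere between Brașov and Sibiu.
reader = (45.70, 25.00)
for name, distance in nearby_mentions(reader, book_locations, radius_km=100):
    print(f"{name}: about {distance:.0f} km away")
```

In a real client the reader position would come from the device's GPS and the location table from the entity-linking step; both are hard-coded here to keep the sketch self-contained.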
Networking Readers
• Using semantic and geographical links to form social communities of readers
  - if “books subscribed to” is declared visible =>
    - co-readers of B (book)
  - if “instantaneous location” is also declared visible =>
    - co-readers of B AND actual co-proximity of L (location)
    - co-readers of B AND co-track of T (trajectory)

Networking Readers: enhance eBooks reading experience
• Easy to imagine other ways to form communities rooted in readings
  - intersect common readings and attended places with levels of friendship reported by other social media, like Facebook or Twitter
  - real-world events and entities mentioned in a book, associated with real-world locations and particular moments of the year/day
  - portraying the user (from accessible social media and MB usage habits) and matching

Usage examples
- I visit a city with the travel guide in my hand
  - places of interest and routes are reordered depending on my instantaneous position

Usage examples
- I am a schoolboy, on the train going from Brașov to Sibiu…
  - if I open my tablet and point it towards the left side window of the train, I will see arrows showing the peaks of the Făgăraș mountains, exactly as in the geography textbook

Augmented reality

Usage examples
- I am in Paris for the 3rd time…
  - but only now does my MB Lonely Planet guide alert me to this temporary exhibition opened in the Pyramid

Towards… living books
• Multidimensional artefacts that combine textual, geographical, temporal and other data
• highlight mentions of persons and locations
• links that exploit:
  - the context of the mention in the book
  - the reader's location
  - the moment of reading
  - the reader's personality and preferences

ALPE (Automated Linguistic Processing Environment)
• A framework system for natural language processing
  - Automatic determination of the format, annotation type and language of a text
  - Integration of linguistic processing modules into a hierarchy of formats
  - Automatic computation of processing chains leading from an input document to a specified output format
• Results (available on METASHARE):
  - module for the automatic determination
    of the format of an XML document
  - module for the automatic comparison/conversion of XML documents
  - module for automatically merging two XML annotations over the same text
Ionuț Pistol (2011) The Automated Processing of Natural Language, Ph.D. thesis, “A. I. Cuza” University of Iași

Example of a processing chain: a system for building discourse trees
[Diagram: formats txt, tok, pos, seg, FDG, NP, RARE, DT connected by the modules Tokenizer-UAIC, POS-RACAI, FDGparser-UAIC, Splitter-UAIC, NPchunker-UAIC, RARE-UAIC, DP-UAIC]
• txt: basic text document
• tok: xml with marked lexical tokens
• pos: xml with marked part-of-speech information
• FDG: FDG trees for each phrase
• NP: xml with marked Noun Phrases
• seg: xml with marked clauses (segments)
• RARE: xml with marked coreference chains (output of the RARE anaphora resolution engine)
• DT: discourse trees of the original texts. One or more trees are produced.

U-Compare
• UAIC processing tools
Partner: 4 UAIC - University Alexandru Ioan Cuza

Data/Soft | Where | Delivery | Designation | U-COMPARE
Data | Endogenous resources | Batch 1 | 1984 NP | NO
Data | Endogenous resources | Batch 1 | 1984AnaphoraRo | NO
Data | Endogenous resources | Batch 1 | FrRoMWE | NO
Data | Endogenous resources | Batch 1 | QA-corpus-UAIC | YES
Data | Endogenous resources | Batch 1 | RO-FDGBank | YES
Data | Endogenous resources | Batch 1 | RO-FN | YES
Data | Endogenous resources | Batch 1 | RoSemClass | YES
Data | Endogenous resources | Batch 1 | TE-pairsResource-UAIC | NO
Data | Endogenous resources | Batch 1 | TE-rules | NO
Data | Endogenous resources | Batch 2/3 | eDTLR-sources |
Data | Endogenous resources | Batch 2/3 | RomMorph-UAIC |
Data | Restricted exogenous resources | Batch 2/3 | DEA |
Data | Restricted exogenous resources | Batch 2/3 | DLPE |
Data | Restricted exogenous resources | Batch 2/3 | eDTLR |
Data | Restricted exogenous resources | Batch 2/3 | RoWN-eDTLR |
Data | Restricted exogenous resources | Batch 2/3 | SRoL – Sounds of the Romanian Language |
Software | Endogenous resources | Batch 2/3 | ALPE-UAIC | NO
Software | Endogenous resources | Batch 2/3 | AnaMorph-UAIC | NO
Software | Endogenous resources | Batch 2/3 | Categorizer-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Diacritics-UAIC | YES
Software | Endogenous resources | Batch 2/3 | DP-UAIC | YES
Software | Endogenous resources | Batch 2/3 | FDGparser-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Language identifier-UAIC | NO
Software | Endogenous resources | Batch 2/3 | Lemmatizer-UAIC | YES
Software | Endogenous resources | Batch 2/3 | NP-chunker-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Occurrence Finder-UAIC | YES
Software | Endogenous resources | Batch 2/3 | OntologyBuilder-UAIC | YES
Software | Endogenous resources | Batch 2/3 | QA-UAIC | YES
Software | Endogenous resources | Batch 2/3 | RARE-RO-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Splitter-UAIC | YES
Software | Endogenous resources | Batch 2/3 | SRL-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Summarizer-UAIC | YES
Software | Endogenous resources | Batch 2/3 | TE-UAIC | YES
Software | Endogenous resources | Batch 2/3 | Tokenizer-UAIC | YES
Software | Restricted exogenous resources | Batch 2/3 | ANNIE | NO

[Workflow diagram components, as provider: component: languages]
1 - RACAI: Lang Identifier
2 - UOM: Paragraph Breaker: Any
3 - UOM: Sentence Splitter: Any
4 - UNIMAN: Genia Sentence Splitter: en
5 - UNIMAN: OpenNLP sentence detector: en
6 - UNIMAN: NaCTeM sentence breaker: en
7 - RACAI: Sentence Splitter: ro, en
8 - UNIMAN: Genia Tagger (with tokenization): en
9 - UNIMAN: Stepp Tagger (with tokenization): en
10 - UNIMAN: Genia Tagger (no tokenization): en
11 - UNIMAN: Stepp Tagger (no tokenization): en
12 - UNIMAN: OpenNLP tokenizer: en
13 - RACAI: TTL Tokenizer: ro, en
14 - UAIC: TokenizerUAIC: ro, en
15 - UNIMAN: Apertium Morpho Analyser: en, ro
16 - UNIMAN: OpenNLP Tagger: en
17 - RACAI: TTL Tagger: ro, en, fr
18 - UAIC: FDG-Parser-UAIC: ro
19 - RACAI: TTL Lemmatizer: ro, en
20 - UAIC: Lemmatizer-UAIC: ro
21 - UNIMAN: morpha: en
22 - UAIC: Splitter-UAIC: ro
23 - UAIC: NP-Chunker-UAIC: ro
24 - RACAI: TTL-Chunker: ro, en

Key
Lang – Language of text
Txt – Plain text
Para – Paragraph annotations
Sent – Sentence annotations
Tok – Token annotations
POS – Part-of-speech annotations
Lem – Lemma annotations
Seg – Segment annotations
FDG – FDG parse annotations
NP – Noun phrase annotations
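
The ALPE slides describe the automatic computation of a processing chain from an input document to a specified output format. One way to picture this is as a shortest-path search over a graph whose nodes are annotation formats and whose edges are processing modules. The sketch below is only an illustration under stated assumptions, not ALPE's actual algorithm: the module and format names come from the discourse-tree example, but the input/output pairing of each module is assumed (the extracted diagram does not preserve its arrows), and each module is simplified to a single input format.

```python
from collections import deque

# Assumed module table: (module name, input format, output format).
# The names appear in the slides; the pairings are hypothetical.
MODULES = [
    ("Tokenizer-UAIC", "txt", "tok"),
    ("POS-RACAI", "tok", "pos"),
    ("FDGparser-UAIC", "pos", "FDG"),
    ("Splitter-UAIC", "pos", "seg"),
    ("NPchunker-UAIC", "FDG", "NP"),
    ("RARE-UAIC", "NP", "RARE"),
    ("DP-UAIC", "RARE", "DT"),
]


def processing_chain(source, target, modules=MODULES):
    """Breadth-first search for the shortest module chain source -> target.

    Returns a list of module names, [] if no processing is needed,
    or None when the target format cannot be reached.
    """
    if source == target:
        return []
    queue = deque([source])
    came_from = {source: None}  # format -> (previous format, module used)
    while queue:
        fmt = queue.popleft()
        for name, fmt_in, fmt_out in modules:
            if fmt_in != fmt or fmt_out in came_from:
                continue
            came_from[fmt_out] = (fmt, name)
            if fmt_out == target:
                # Walk back to the source to recover the chain.
                chain = []
                step = came_from[fmt_out]
                while step is not None:
                    previous, module = step
                    chain.append(module)
                    step = came_from[previous]
                return chain[::-1]
            queue.append(fmt_out)
    return None


print(processing_chain("txt", "NP"))
# -> ['Tokenizer-UAIC', 'POS-RACAI', 'FDGparser-UAIC', 'NPchunker-UAIC']
print(processing_chain("seg", "tok"))
# -> None
```

A module that needs several input annotations at once (as a discourse parser plausibly does) would require a richer model than single-input edges, for instance a search over sets of available formats.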